Detecting Target Objects by Natural Language Instructions Using an RGB-D Camera

Authors

  • Jiatong Bao
  • Yunyi Jia
  • Yu Cheng
  • Hongru Tang
  • Ning Xi

Abstract

Controlling robots by natural language (NL) is attracting increasing attention because it is versatile and convenient and does not require extensive user training. Grounding, which enables robots to understand NL instructions from humans, is a crucial challenge in this problem. This paper mainly explores the object grounding problem and, concretely, studies how to detect target objects from NL instructions using an RGB-D camera in robotic manipulation applications. In particular, a simple yet robust vision algorithm is applied to segment objects of interest. From the metric information of all segmented objects, object attributes and the relations between objects are then extracted. NL instructions that combine multiple cues for specifying objects are parsed into domain-specific annotations. The annotations from the NL instructions and the information extracted from the RGB-D camera are matched in a computational state estimation framework that searches all possible object grounding states. The final grounding is obtained by selecting the states with the maximum probabilities. An RGB-D scene dataset, associated with different groups of NL instructions that correspond to different cognition levels of the robot, is collected. Quantitative evaluations on this dataset demonstrate the advantages of the proposed method. Experiments on NL-controlled object manipulation and NL-based task programming with a mobile manipulator show its effectiveness and practicality in robotic applications.
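The abstract describes matching annotations parsed from an NL instruction against attributes and relations extracted from segmented objects, then selecting the grounding states with the maximum probability. The sketch below illustrates only that general idea under strong simplifying assumptions; it is not the authors' algorithm. Everything in it is hypothetical: the `SegmentedObject` and `Annotation` records, the `ground` function, the fixed `MATCH`/`MISMATCH` likelihoods, the use of color as the only attribute, and the "left_of"/"right_of" relations decided from the x coordinate of metric centroids. Brute-force enumeration of (target, landmark) pairs stands in for the paper's state estimation framework, whose details are not given here.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional

# Assumed likelihoods for a single cue that agrees / disagrees with the instruction.
MATCH, MISMATCH = 0.9, 0.1


@dataclass(frozen=True)
class SegmentedObject:
    """Hypothetical record for one object segmented from the RGB-D scene."""
    name: str
    color: str
    centroid: tuple  # metric (x, y, z) in meters; x assumed to grow to the right


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation produced by parsing one NL instruction."""
    color: Optional[str] = None           # attribute of the target object
    relation: Optional[str] = None        # "left_of" or "right_of"
    landmark_color: Optional[str] = None  # attribute of the reference object


def cue(ok):
    return MATCH if ok else MISMATCH


def attribute_likelihood(obj, wanted):
    # An attribute the instruction does not mention carries no evidence.
    return 1.0 if wanted is None else cue(obj.color == wanted)


def relation_holds(relation, a, b):
    # Spatial relations decided from the metric centroids of the two objects.
    if relation == "left_of":
        return a.centroid[0] < b.centroid[0]
    if relation == "right_of":
        return a.centroid[0] > b.centroid[0]
    raise ValueError(f"unknown relation: {relation}")


def ground(objects, ann):
    """Score every grounding state; return the most probable states and their probability."""
    if ann.relation is None:
        states = [(t, None) for t in objects]
    else:
        states = list(permutations(objects, 2))  # ordered (target, landmark) pairs
    if not states:
        return [], 0.0
    scores = []
    for target, landmark in states:
        p = attribute_likelihood(target, ann.color)
        if landmark is not None:
            p *= attribute_likelihood(landmark, ann.landmark_color)
            p *= cue(relation_holds(ann.relation, target, landmark))
        scores.append(p)
    total = sum(scores)
    posterior = [s / total for s in scores]  # normalize over all enumerated states
    best = max(posterior)
    winners = [st for st, q in zip(states, posterior) if abs(q - best) < 1e-12]
    return winners, best


# Usage: "pick up the red object to the left of the blue one"
scene = [
    SegmentedObject("cup", "red", (-0.20, 0.0, 0.8)),
    SegmentedObject("box", "red", (0.15, 0.0, 0.8)),
    SegmentedObject("ball", "blue", (0.00, 0.0, 0.8)),
]
ann = Annotation(color="red", relation="left_of", landmark_color="blue")
winners, prob = ground(scene, ann)
print([(t.name, lm.name) for t, lm in winners])  # → [('cup', 'ball')]
```

Enumerating ordered pairs is quadratic in the number of segmented objects, which stays cheap for tabletop scenes with a handful of objects. Returning every tied state, rather than a single one, mirrors the abstract's wording that the states with the maximum probabilities are selected; an ambiguous instruction such as "the red object" then yields more than one candidate.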

Volume 16

Published 2016